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We present an analytically solvable model for self-assembly of a molecular complex on a filament. 

The process is driven by a seed molecule that undergoes facilitated diffusion, which is a search 
strategy that combines diffusion in three-dimensions and one-dimension. Our study is motivated 
by single molecule level observations revealing the dynamics of transcription factors that bind to 
the DNA at early stages of transcription. We calculate the probability that a complex made up of 
a given number of molecules is completely formed, as well as the distribution of completion times, 
upon the binding of a seed molecule at a target site on the filament (without explicitly modeling 
the three-dimensional diffusion that precedes binding). We compare two different mechanisms of 
assembly where molecules bind in sequential and random order. Our results indicate that while the 
probability of completion is greater for random binding, the completion time scales exponentially 
with the size of the complex; in contrast, it scales as a power-law or slower for sequential binding, 
asymptotically. Furthermore, we provide model predictions for the dissociation and residence times 
of the seed molecule, which are observables accessible in single molecule tracking experiments. 

PACS numbers: 05.20.Dd, 82.20.Fd, 87.10.Mn 


I. INTRODUCTION 

Many biochemical processes involve formation of meso¬ 
scopic molecular structures that perform complex tasks. 
One well-known example is the transcription complex 
which plays the key role in accessing the information 
coded in the DNA [1]. During the process of transcrip¬ 
tion, a complex consisting of RNA polymerase and tran¬ 
scription factors is assembled on the DNA in order to read 
the genetic information, and produce RNA molecules. 
Thanks to powerful methods of molecular biology, it has 
been possible to study the number and types of molecules 
involved in the formation of the transcription complex; 
nevertheless, kinetics of the formation of transcription 
complex is much less known, and is now an active area of 
biophysics [2-4] . A key experimental finding [5] regarding 
the kinetics of transcription factors is that at least some 
of the molecules that bind to sites on DNA undergo fa¬ 
cilitated diffusion [G, 7], during which molecules diffusing 
in three-dimensions (3-D) can temporarily bind to the 
DNA, diffusing along the filament and densely exploring 
it, which is thought to be an efficient search strategy [5]. 
Other aspects of faciliated diffusion, such as how it would 
affect the noise in transcriptional regulation, have also 
been explored in previous theoretical work [8, 9]. 

We consider the kinetics of a self-assembly process in 
which a seed molecule diffuses in 3-D, and gets temporar¬ 
ily attached to a filament that carries a target site (see 
Fig. 1 for an illustration of the process and of facilitated 
diffusion). In this work, we do not explicitly model diffu¬ 
sion in 3-D, which has been studied earlier [6], and focus 
on the dynamics of a nucleation process initiated by the 
seed, as described below. While it is associated with the 
filament, the seed molecule undergoes one-dimensional 
(1-D) diffusion and when it occupies the target site, it 
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FIG. 1. (Color online) Illustration of the process of self assem¬ 
bly via facilitated diffusion. In (a), facilitated diffusion of the 
seed molecule is shown, where it performs Brownian motion 
in 3-D (not explicitly modeled in this work) and temporarily 
binds to filaments, diffusing in 1-D (at a rate oc /). When 
the seed arrives at the target site, it binds at rate 6i, and 
if it is already bound, becomes unbound at rate ui. In (b), 
assembly process is shown for w = 4. After the seed is bound, 
additional molecules are recruited via reversible binding. 


becomes bound at a constant rate, triggering the subse¬ 
quent binding of other molecules that bind and unbind 
at constant rates. When w molecules are assembled, the 
process is complete. We consider the case where the seed 
molecule is initially bound at the target site, and focus 
on the kinetics of the rest of the process. Behavior of the 
seed molecule is motivated by the observation of facili- 
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tated diffusion of transcription factors in bacterial cells, 
as mentioned above. We envisage that the seed is an 
essential molecule for transcription initiation which pos¬ 
sesses binding sites for other molecules or induces the 
binding of additional transcription factors, such as the 
RNA polymerase [1-1]- In eucaryotic cells, RNA poly¬ 
merase II (RNAp2) transcribes the majority of genes. 
Although the binding order of molecules that form the 
transcription initiation complex is not clear, experiments 
suggest that a number of transcription factors need to 
bind both before and after the binding of RNAp2 [4]. 
Therefore, if one were to think of RNAp2 as the seed 
molecule, the model presented here would be applicable 
to the latter part of the assembly process, starting with 
the binding of RNAp2. Alternatively, the seed can repre¬ 
sent a molecule that binds during the initial stages of as¬ 
sembly. A candidate for such a molecule is TFIID, which 
significantly changes the local conformation of the DNA, 
paving the way for subsequent molecules to bind [ 1 ]. 

Our main results consist of an exact expression for the 
probability that the assembly completely forms upon the 
binding of a seed molecule, and the distribution of the 
completion time. In addition to these quantities that 
describe the kinetics of assembly, and to link model pre¬ 
dictions with quantities that can be accessed in single 
molecule observations, distributions of the time at which 
the seed dissociates from the filament, as well as the res¬ 
idence time in an interval containing the target site are 
also presented. 

In what follows, we first describe a mathematical 
model corresponding to the process described above and 
illustrated in Fig. 1. We then present results of analytic 
calculations for the quantities mentioned above. Lastly, 
we discuss the applicability of the model as well as previ¬ 
ous findings on the same problem, and state our conclu¬ 
sions. We also provide a comparison of analytical results 
with simulations of the process, presented in Appendix 
E. 


II. AN ANALYTICALLY SOLVABLE MODEL 
OF ASSEMBLY 

A. Model description and assumptions 

We consider the formation of a molecular complex con¬ 
sisting of w elements that are assembled sequentially or 
in random order. Formation of the complex is initiated 
by a seed molecule, which we will just refer to as the 
seed. The seed is envisaged to diffuse in 3-D (not ex¬ 
plicitly modeled) and temporarily attaches to filaments 
along which it undergoes 1-D diffusion. When the seed 
occupies the target site on the filament, it can become 
bound at rate bi. 

After the seed is bound, a total number of w — 1 
molecules start to assemble, which are, so to speak, re¬ 
cruited by the seed. If the seed is the only molecule in the 
complex, it becomes unbound at rate ui , returning to dif- 
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FIG. 2. (Color online) Illustration of the assembly model as 
a Markov chain. Circles and squares correspond to different 
states of the system, and arrows show state transitions with 
corresponding rates. Circles denote different positions on the 
filament, indexed by m, modeled as a lattice, where the seed 
molecule performs a random walk between adjacent sites, with 
a hopping rate /. As shown in the lower right, the random 
walker disappears from the system at a constant rate 7 only 
while it is diffusing along the filament [states indicated by 
circles]. Squares correspond to the bound states indexed by 
i. For sequential and random binding models, the rates &i>i 
and Mi>i are interpreted in a different fashion (see text for 
explanation). 


fusion along the filament. When two or more molecules 
(including the seed) are bound at the same time, the seed 
cannot become unbound. Note that this assumption may 
not be applicable in general, and the model considered in 
this work is appropriate for the case where the binding of 
subsequent molecules stabilizes the complex. While the 
seed is unbound and diffusing on a filament, it dissociates 
at a constant rate 7 . When w molecules are assembled, 
the process is complete. 

In this work, we do not explicitly model the motion 
of the seed in 3-D, and focus on the dynamics when the 
seed molecule is initially bound at the target site. This 
approach allows us to study the kinetics of the assem¬ 
bly process in the presence of a low concentration of seed 
molecules, which is often a good assumption in cell biol¬ 
ogy [ 10 ], accounting for the effect of facilitated diffusion. 
For a treatment of the problem of searching for a target 
site in a filament via 3-D diffusion interrupted by peri¬ 
ods of 1-D exploration, we refer the readers to existing 
literature [7-9, 11, 12]. 

While a cell can contain a large number of transcription 
factors as well as corresponding target sites, we consider 
a regime where the concentration of seed molecules is low 
such that the competition for the target site is negligi¬ 
ble, and the dynamics of the system is well characterized 
by a single seed and a target site. Nevertheless, we do 
provide a generalization for multiple molecules under the 
assumption of negligible competition (see Section HID). 
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B. Mathematical details and predicted quantities 

Based on the biologically inspired model illustrated in 
Fig. 1, we consider a model for diffusion and assembly 
that can be analytically solved by standard tools of sta¬ 
tistical mechanics [13]. See Fig. 2 for an illustration of 
the corresponding mathematical model. 

Diffusion along the filament is modeled as a continuous 
time random walk in a 1-D lattice, where the random 
walker, that is the seed, hops between adjacent lattice 
sites at a (symmetric) rate / (see Fig. 2). For a DNA 
filament, it is natural to think that the lattice sites corre¬ 
spond to base pairs that are separated by ~ 0.34 nm [1]. 
The lattice site with index to = 0 is where the target is 
located. When the seed is occupying site to = 0, it can 
become bound at rate bi, upon which the system would 
transition to the first bound state i = 1. Being in the first 
bound state, the system can either go back to the state 
where the seed is diffusing, at rate ui, or transition to the 
next bound state, f = 2 , if an auxiliary molecule becomes 
bound. Note that while the meaning of the index to is 
straightforward (the position of the seed), the physical 
picture ascribed to the i**' bound state depends on the 
details of how molecules are assembled. We consider two 
different models of assembly: sequential and random. 

In sequential assembly, w — 1 auxiliary molecules can 
reversibly bind in order, once the seed becomes bound. 
When the **** auxiliary molecule binds or unbinds, the 
system transitions to the i + 1 ®* or bound state, re¬ 
spectively. The binding and unbinding rates for the **** 
auxiliary molecule is equal to and iti+i, respectively. 

If auxiliary molecules assemble in random order, 
equally likely in any of the (w — I)! possible ways, the 
model illustrated in Fig. 2 can still be used, provided 
that the binding and unbinding rates of different auxil¬ 
iary molecules are sufficiently similar. We suppose that 
each molecule binds and unbinds at the uniform rates 6 * 
and , respectively. Provided that this is the kinetics at 
the level of individual molecules, the transition rates for 
the whole system would become 6 i>i = {w — i + 1 ) 6 * and 
Mi>i = — 1 )^*, which was also employed in a previous 

study [14]. This is a consequence of treating the reac¬ 
tions involving auxiliary molecules as Poisson processes 
with exponential waiting times. When j of the auxiliary 
molecules are bound, such that the system is in state 
i = j -|- 1 , binding of any of the remaining w — \ — j 
molecules would take the system to the j -I- 2"'^ state, 
or unbinding of any of the j molecules would take the 
system to the state. The former takes place at rate 
{w—j)b:t:, whereas the latter at ju^:, since the minimum of 
a set of independent exponential random variables is also 
distributed exponentially, with a parameter that equals 
the sum of all individual random variables’ parameters. 

In addition to studying the kinetics of the assembly, 
we are also interested in making predictions for observ¬ 
ables relevant in single-molecule tracking experiments. 
Supposing that the seed is labeled (via fluorescent dyes, 
quantum dots, etc.) and its position can be tracked, one 


may be able to observe the time it takes for the seed to 
dissociate from the filament, or the time it takes for it to 
exit from an interval of length 2r centered at the target 
site, given the seed was observed to be bound at t = 0. 
To be able to calculate the statistics of these two times 
from the model, we introduce, for solely mathematical 
convenience, two leaky sites at to = —r and r, where 
the seed disappears from the system at rate k. In the 
next section, we consider the limits «; —>■ 0 and k —> oo 
depending on which calculation is of concern. 

Based on the model illustrated in Fig. 2, we write 
a set of master equations for the probability of finding 
the seed (unbound) at lattice site m, denoted by Pm{t), 
and the probability of finding the system in the bound 
state, denoted by Qi{t), at time t. Note that Qw{t) is the 
probability of having a fully assembled complex at time 
t, from which we will derive the probability of completion 
as well as the first completion time, in the next section. 
The master equation as well as its analytical solution is 
given in Appendix A. In the next section, we present re¬ 
sults derived from the probabilities Pm and Qi , assuming 
that they are known, and always refer to appendices for 
calculation details. 


III. RESULTS 

In this section we present results for the kinetics of 
the assembly process obtained by solving the model il¬ 
lustrated in Fig. 2. Results are displayed in a way that 
highlights the difference between the kinetics for sequen¬ 
tial and random binding models. For convenience, and to 
be able to treat the case of random binding as described 
in the previous section, we consider uniform rates 6* and 
M* as the binding and unbinding rates of each auxiliary 
molecule, regardless of order. 

All rates and times are measured in units of / and 
1 //, respectively, where / is the hop rate between adja¬ 
cent sites in the lattice, proportional to the 1-D diffusion 
coefficient. 


A. Probability of completion 

A key quantity that characterizes the efficiency of the 
assembly process is the probability that the molecular 
complex completely forms before the seed dissociates 
from the filament, given the seed was initially bound 
(that is, (5i(0) = 1, Qi^i{0) = 0 and Pm(0) = 0). We re¬ 
fer to this quantity as the probability of completion, and 
denote it by Pcomp- We note that Pcomp is the probability 
of arriving at the bound state w at any time as t —>■ oo, 
which is also the fraction of system trajectories that end 
at w. Therefore, we have Pcomp = hmt^oo (t), un¬ 
der the condition Uw = 0, ensuring that trajectories that 
reach the last bound state are terminated. Performing 
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FIG. 3. (Color online) Contour plot of the probability of 
completion as a function of A and /3. 


the calculation, we obtain (see Appendix B 2) 

1 


p — 
^ comp — 


1 + 


A(tc) ’ 

1 + /3 


( 1 ) 


where /3 = 6 i/ 1 / 7(7 + 4/) and A(u>) is a constant formed 
by the combination of all the rates bi and Ui except 61 , 
explicitly given in (Bll). If the complex is made up of 
just a pair of molecules, \{w) has a particularly simple 
form, given by A(2) = m. Note that /3 quantifies the 
ratio of the affinity to the binding site (i.e., nucleation 
rate given the particle occupies the binding site) to the 
rate at which the seed is carried away from the binding 
site, either via dissociation or diffusion. When //y 1, 

meaning dissociation is much more rapid than diffusion 
along the filament, we have /3 oc 61 / 7 ; and when //y ^ 1, 
meaning diffusion is much faster than dissociation, j3 oc 

^i/VTy- 

Since A and (3 do not share any parameters, they con¬ 
stitute a good pair of knobs that can be used to inves¬ 
tigate the behavior of Pcomp- In Fig. 3, a contour plot 
of Pcomp is displayed as a function of A and /3. We note 
that it becomes less and less probable for the complex to 
be completed as A increases, or (3 decreases. This is in 
line with what one may expect intuitionally, since larger 
A values correspond to relatively larger unbinding rates 
Mi>i, and smaller /3 values imply that the seed is diffusing 
fast and/or it dissociates from the filament quickly. 

As A and (3 are combinations of many parameters, it is 
informative to explore the behavior of Pcomp ns a function 
of parameters whose physical meaning is more direct. In 
this respect, next, we display how Pcomp changes with 
the total number of molecules in the complex and the 
ratio of the binding and unbinding rates 

In Fig. 4, Pcomp is plotted as a function of w, for five 
different values of indicated by curves with dif¬ 

ferent shades of gray. In this figure and throughout the 
article, dashed and solid curves correspond to random 


FIG. 4. Pcomp as a function of the total number of molecules in 
the complex for different values of the ratio &*/«* (obtained by 
varying u*). Solid and dashed curves correspond to sequential 
and random binding, respectively. Parameter values are: 7 = 
0.1, foi — 2, bf = 0.25, and ui = 1, measured in units of /, 
the hopping rate along the filament. 


and sequential binding models, respectively, unless oth¬ 
erwise noted. We see that Pcomp has greater values for 
random binding compared to sequential binding for the 
same set of parameter values. In random binding, Pcomp 
can be non-monotonic in the number of bound states, 
depending on the value of the ratio 

The behavior of Pcomp as a function of w can be intu¬ 
itively understood, to a certain extent, by ignoring the 
diffusion states and associating m = 0 with dissociation 
(see Fig. 2). Then the problem can be viewed as (bi¬ 
ased) random walk in a I-D lattice with rc -|- 1 sites, 
i = 0, I, ..., w where arriving at the top (site 0) means 
dissociation, and arriving at the bottom (site w) means 
completion, and the random walker starts at site i = 1. 
We take bi = for simplicity. In the next three para¬ 
graphs, we provide an intuitive explanation for the be¬ 
havior observed in Fig. 4. 

In sequential binding, the rate of acquiring and losing 
an auxiliary molecule does not depend on the number of 
already bound molecules. When > 1, the random 

walk is biased downward, leading to the completion of 
the process with a probability that increases with 
(see Fig. 4). On the contrary, for 6 ,/m* < 1, the bias is 
against any motion towards completion. Therefore, the 
only way for the random walker to end up at the bottom 
first is via a highly improbable sequence of downward 
steps, whose probability is expected to dramatically di¬ 
minish with increasing mi, leading Pcomp to zero as a func¬ 
tion of MI. 

In random binding, when n molecules are bound, the 
rate at which the complex grows is given by (w — n)b^:, 
and the rate at which it shrinks is nu ^:. Therefore, there is 
a state with n, = b^:w/{ut + 6 *) bound molecules, which 
is more stable than others in the sense that the growth 
and shrinking rates are approximately balanced. Note 
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that we have rs n^/{w — n*), implying that for 

> 1 we expect to have, on average, more bound 
molecules than missing ones. On the other hand, for 
< 1, we expect to have more missing molecules. 
When > 1, the stable state is in the lower half of 

the simplified lattice, and fluctuations are more likely to 
drive the system to the bottom, i.e. completion. In the 
other case, < 1, the stable state is in the upper 

half, and fluctuations are more likely to take the system 
to the top, i.e. dissociation. This intuitive picture is in 
line with the results shown in Fig. 4. Note that the the 
curve that corresponds to = 0.833 indicates that 

the stable state in this case moves from the lower half to 
the upper half at around w r; 15. 

In the random binding model, when < 1, we 

observe a transient increase in Pcomp, although it even¬ 
tually decays to zero with increasing w. To understand 
this, we note that the completion probability can be ex¬ 
pressed by (1 — p)q, where p is the probability of dis¬ 
sociation during the first transition, and q is the proba¬ 
bility of completion starting from site 2 (as the random 
walker of the simplified problem would end up in state 
2 if it did not dissociate after the first step). Note that 
p = l/(l-|-(tc— l)b^,/u^,), the probability of taking the 
first step upward, decreases with w (the number of aux¬ 
iliary molecules). Therefore, one would expect an initial 
increase in Pcomp due to the increasing factor (1 — p). 
Nevertheless, for < 1, this increase is balanced by a 
decrease in q, as completion before dissociation gets more 
and more improbable as the number of steps required to 
arrive at w increases, as discussed in the previous para¬ 
graph. 

On the whole, we find that there is a qualitative differ¬ 
ence in the behavior of Pcomp for random and sequential 
binding, and that there can be an optimal value of w that 
maximizes Pcomp bi random binding, depending on the 
relative strength of binding and unbinding rates of the 
auxiliary molecules. 

Next, we consider how the probability of completion 
depends on the ratio for different values of the 

total number of molecules. Fig. 5 (a) shows Pcomp as a 
function of 6,/it, for 3 < w < 7. We note that Pcomp 
monotonically increases with for both random and 

sequential binding models. In sequential binding, Pcomp 
does not depend on w for b^/u^: 3> 1. In contrast, the 
values of Pcomp at ^ 1 increase with w, by as 

much as r; 25% in the random binding model. In Fig. 5 
(b), the ratio Pcomp/^ramp: where the superscripts denote 
the binding order of auxiliary molecules, is plotted as a 
function of 5*/it*. We observe that Pcomp is greater for 
the random binding model, especially for 5*/it* > 1, and 
that there is a large drop in the ratio around 6*/it* = 1. 

Lastly, we present results quantifying the effect of fa¬ 
cilitated diffusion on how likely the process is going to 
be completed. Given the seed is initially bound with 
no other bound molecules, setting 6i = 0 would pre¬ 
vent the possibility of rebinding since the seed cannot 
become bound again if it ever transits to the diffusive 


state. Therefore, the ratio p = Pcomp/Pcomp(&i = 0) 
would give us the relative enhancement of completion 
probability due to facilitated diffusion. Note that setting 
/ = 0 maximizes the enhancement due to rebinding, as 
in this case a seed that becomes unbound does not leave 
the binding site (m = 0) via 1-D diffusion, such that 
rebinding occurs at the maximal rate bi. 

The relative enhancement ratio p is explicitly given by 


P = 


1 +A 


1-b 


1 + /3 


( 2 ) 


Expanding (2) around /3 = 0, we get 


P - 1 + (/5^) > 

which clearly shows that there is no enhancement when 
the affinity to binding site is zero, /? = 0 , or when dis¬ 
sociation or diffusion rates diverge, that is, 7 —>■ oo or 
/ —?► 00 , implying /3 —>■ 0. Note that the enhancement 
initially increases linearly with the ratio /3 [defined below 
( 1 ) 1 - 



FIG. 5. Probability of completion as a function of the ratio 
bt/Uf. In (a), Pcomp is plotted for different values of the total 
number of molecules in the complex. Solid and dashed curves 
correspond to sequential and random binding, respectively. In 
(b), ratio of Pcomp for sequential binding to that for random 
binding is displayed. Parameter values for both graphs are: 
7 = 0.1, bi — 1, bt = 0.25, and ui = 1, measured in units of 
/, the hopping rate along the filament. 
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FIG. 6 . (Color online) Contour plot of the enhancement ratio 
due to facilitated diffusion. Sequential and random binding 
models are indicated by circles and squares, respectively, and 
symbols with the same number have identical binding and 
unbinding rates per molecule: u* = 0.25 for 1 and 3; 0.4 for 
2, 4 and 5; 0.5 for 6 . Points on the same curve correspond 
to the same enhancement factor for different values of u*, 
where other parameters are fixed as w = 5, 7 = 0.1, 61 = 2, 
bt = 0.25, and ui = 1, where rates are measured in units 
of /. Data points are color coded in grayscale according to 
their Pcomp values and the corresponding color bar is indicated 
below the graph. 


To demonstrate the behavior of the enhancement fac¬ 
tor, a contour plot of p is displayed in Fig. 6 as a func¬ 
tion of A and /3, plotted in the same domain as Pcomp 
in Fig. 3. We see that facilitated diffusion enhances the 
probability of completion as A and (3 increase. Greater j3 
values correspond to a situation where the seed spends 
greater amount of time bound to the filament, in the 
vicinity of a lattice site, and hence explains the increase 
in p. While the parameter A depends on many model pa¬ 
rameters, it is roughly proportional to Therefore, 

greater A values correspond to relatively larger unbinding 
rates, which hinders the completion of the process both 
with and without diffusion. Higher p values for increas¬ 
ing A suggest that facilitated diffusion can counter this 
hindrance effect to a limited extent. 

In Fig. 6, we also include a number of data points to 
compare the effect of facilitated diffusion in random and 
sequential binding. Sequential and random binding mod¬ 
els are indicated with circles and squares, respectively. 
Data points with the same index correspond to the same 
value of 6* and u*, and hence the same per molecule 
binding/unbinding rates for auxiliary molecules, while 
the rest of the parameters are determined by imposing 
the condition that p is fixed. Each data point is colored 
in grayscale according to its Pcomp value, and the color 
bar is shown at the bottom of the graph. Comparing 
the relative positions of the data points with the same 


index and considering the corresponding Pcomp values, 
we conclude that with the same kinetic constants per 
reaction (6* and w*), random binding has larger prob¬ 
ability of completion; however, achieving the same en¬ 
hancement factor with random binding requires stronger 
association with the filament as compared to sequential 
binding (compare symbols located on the same line). 


B. Completion Time 


In the previous section we demonstrated the behavior 
of the completion probability as a function of a subset of 
the model parameters and found that the order in which 
auxiliary molecules bind has a significant effect on the 
chance of completion, random binding being more effi¬ 
cient. Next, we present model predictions regarding the 
time-dependence of the process, providing the comple¬ 
mentary information to answer the question: given the 
process completes, how long does it take? 

We define Tcomp as the completion time, the time at 
which all w molecules are assembled for the first time, 
given the seed is the only molecule initially bound. Let 
/comp(t) be the probability density function of the ran¬ 
dom variable Pcomp, conditioned on the completion of the 
process. In terms of Qi(t), the conditional distribution 
/comp(t) can be expressed as 


/comp(t) — 


comp 

1 


[f '^w — 0)] 


dQ 'u 


dt 


Uni—O 


where the quantity in square brackets in the first line is 
the survival probability, that is, the probability of not 
having visited the ui**' bound state until time t, and we 
need the normalization constant Pcomp, as /comp is con¬ 
ditioned on the completion of the process. In Appendix 
B2, we show that the Laplace transform of /comp(t) is 
given by 


/comp(e) — 


(-ir 


comp 


Y[Ki{w) 


\z=2 

e 


e + 62 -h uia{e)/ (a(e) + bi) -h U 2 K 2 {w) 


Uw—O 


( 3 ) 


where a(e) = 7/(7 -I- e)(7 + 4:f + e), Ki{w) is a continued 
fraction involving the rate constants [see (AS)], and the 
Laplace transform is defined as 

^ pOO 

/(e) = / dte-^*f{t). (4) 

Jo 

Throughout the text, we will use tildes to distinguish 
Laplace transformed quantities. 
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FIG. 7. Mean completion time as a function of the total 
number of molecules in the complex. In all plots, solid and 
dashed curves correspond to sequential and random binding, 
respectively. In (a), /icomp is plotted on semi-logarithmic axes 
(y), and indicates that the mean completion time asymptot¬ 
ically grows exponentially for random binding. In (b), /Tcomp 
is plotted on log-log axes, and suggests that the mean com¬ 
pletion time asymptotically grows as a power-law, or slower, 
for sequential binding. Parameter values for all graphs are: 
7 = 0.1, bi = 2, b, — 0.25, and ui — 1, measured in units of 
/, the hopping rate along the filament. 

In the following, we first discuss how the mean and 
variance of Tcomp depend on the binding order and model 
parameters and then demonstrate the behavior of the full 
distribution /comp(i)- 

1. Mean and variance of the completion time 

The moment of Tcomp can be calculated from the 
Laplace transform of fcompit) as 

((Tcomp)n = (5) 

which follows from (4). We define ^comp = (Tcomp) and 
CV(Tcomp) = [((Tcomp) ) ~ Mcompl/Mcompj mean 

and coefficient of variance of the completion time. 

Fig. 7 shows ^comp as a function of w, for a set of 
values. Plots in (a) and (b) correspond to identi¬ 
cal parameter values, but the axes are scaled differently. 
Looking at (a), where the y-axis is scaled logarithmically. 


FIG. 8. Coefficient of variance of the completion time for 
sequential and random binding models. Solid and dashed lines 
correspond to sequential and random binding, respectively, 
for various values of the ratio 6*/u*. Parameter values are: 
7 = 0.1, bi = 1, bt = 0.25, and ui — 1, measured in units of 
/, the hopping rate along the filament. 


we first note that ^comp increases exponentially with w 
for the case of random binding (dashed curves). The ex¬ 
ponent increases as b^/uj^ decreases to approach the value 
1 from above, which is simple to grasp intuitionally, as 
higher relative unbinding rates would lead to longer com¬ 
pletion times. When the ratio 6,/u* is below one, results 
show the opposite trend, where the exponent decreases 
with decreasing 6*/it*. This reflects the conditional na¬ 
ture of the completion time. In the previous section, 
we showed that Tcomp approaches 0 for < 1 as w 

increases, meaning that the fraction of trajectories that 
lead to the completion of the process becomes negligible. 
Results in Fig. 7 (a) suggest that while it is quite un¬ 
likely for the process to complete when < 1, the 

trajectories that do lead to completion take shorter and 
shorter times as u* increases (this point will be discussed 
further below). Note that /Tcomp clearly increases slower 
than exponentially for sequential binding (solid curves). 

In Fig. 7 (b), the same data is plotted on logarithmic 
axes. The most prominent feature here is that /Tcomp for 
sequential binding (solid curves) increases as a power law 
for = 1, and slower than a power law for 6*/it* ^ 1. 

The rate of increase as a function of b^/u,: follows a trend 
akin to that for random binding in (a). 

Results above show that the behavior of the average 













completion time is qualitatively different for random and 
sequential binding models as the number of molecules in 
the complex increases. Since completion of the assembly 
is a stochastic process with many steps, one may expect 
significant variance in the values of Tcomp such that a 
typical value may lie far away from /Xcomp- To investi¬ 
gate this, we calculate the coefficient of variance (CV) of 
Tcomp, defined just below (5). Fig. 8 shows CV(rcomp) 
as a function of w for the same set of b^/u^ values in Fig. 
7. As seen in Fig. 8 (a), we find that random and sequen¬ 
tial binding models display approximately the same level 
of variability in Tcomp for the first few w values. How¬ 
ever, for w ^ 2, the coefficient of variance can be sig¬ 
nificantly smaller for sequential binding compared with 
random binding. In random binding (dashed curves), CV 
transiently dips below 1 for a range of w values depend¬ 
ing on but eventually approaches the value 1. In 

sequential binding however, the asymptotic behavior of 
CV depends on the value of 6*/^*, as shown in the log-log 
plot in Fig. 8 (b). 

When = 1, random and sequential binding mod¬ 

els both give rise to a CV that attains a constant value 
as w increases, that is, the standard deviation of Tcomp 
increases at the same rate as its mean. This implies 
that the distribution of completion times remain well- 
dispersed no matter how large the number of auxiliary 
molecules gets. 

In the case of sequential binding, for 1, we 

observe that the CV decays to zero with w. While it is 
not straightforward to provide a simple explanation for 
this behavior for all values of we can gain some 

insight into it by considering the extreme cases ^ 1 

and <C 1, as discussed below. 

When ^ 1, almost all transitions are towards 

completion, meaning that we have mean (Tcomp) ~ 
that is, the number of transitions to completion mul¬ 
tiplied by the average duration of a transition. Since 
the variance of the duration between transitions (towards 
completion) is given by variance of the completion 
time would be var (Tcomp) ~ wb~‘^. Therefore, we expect 
the CV to decay as w~^ in the limit 3> 1, which is 

consistent with the power-law behavior observed in Fig. 
8 (b). 

When 1, we expect the process to reach com¬ 

pletion very rarely, as the complex is much more likely 
to shrink than grow at any state. Nevertheless, when the 
complex does form, the maximally likely system trajec¬ 
tory would consist of w consecutive transitions towards 
completion, since any transition towards the unbound 
state would introduce an additional multiplicative factor 
of in the likelihood of a trajectory that reaches 

completion. Therefore, we expect CV to decay in the 
same way as it does for ^ 1, as explained in the 

paragraph above; the only difference is that completion 
only rarely occurs for <C 1, while it is the typical 

outcome for ^ 1. 

We note that a previous study by D’Orsogna and 
Chou [11] employed a similar model, although in a differ¬ 


ent context (ligand-receptor binding) and without spatial 
extent, and found that random binding results in faster 
mean completion times compared with sequential bind¬ 
ing, except when k, 1, in the absence of any co- 

operativity. Our findings are consistent with this result 
up to a certain value of w, say Wc- As seen in Fig. 7 
(b), the crossover value Wc above which random binding 
leads to longer completion time increases with b^,/ut:. We 
also would like to point out that the presence of diffusive 
states and the possibility of dissociation significantly af¬ 
fects the behavior of Tcomp- 

Overall, we find that sequential binding model results 
in more precise timings compared to the random binding 
model. In addition, as clearly seen in Fig. 7, sequential 
binding is orders of magnitude faster than random bind¬ 
ing for w > 10. Note that the predictions of an assembly 
model is only feasible if the completion time is less than 
the longest time scale in a cell, i.e. duration of the cell 
cycle. 


2. Distribution of the completion time 

Here we demonstrate the full distribution of Tcomp, 
which is obtained by taking the inverse Laplace trans¬ 
form of the expression in (3) (see Appendices B 2 and 
D). 

Fig. 9 (a-c) show /comp(t) for sequential and random 
binding models, indicated by the superscripts “seq” and 
“ran”. All graphs display /comp(t) as a function of t, for 
w = 4, 8, 16 and b^,/u^, = 2.5, 1, 0.625, where different 
b^,/u^, values are color coded. Note that time axes are log¬ 
arithmically scaled and cover a broad range containing 4 
to 6 decades. Looking at (a), we see that /comp(i) has 
a bell-shaped form, whose peak increases with w as one 
would expect. We observe that the distribution is most 
widely spread around its peak when = 1. This 

is consistent with the CV shown in Fig. 8, for w ^ 2, 
where CV for sequential binding attains its maximum at 
= 1, and decreases for other values, implying a 
narrower distribution. In (b), identical parameter val¬ 
ues are used to plot /comp(t)- This time, we note that 
the distribution gets spread over a wide range of t much 
quicker with increasing w. To better visualize this case, 
we display the same data on logarithmically scaled axes 
in (c). We immediately note that the curves for tc = 16 
are now clearly visible, and that the distribution becomes 
significantly uniform over a broad range of t values as w 
increases (for instance, /comp(t) for w = IQ attains the 
value K, 10“"^ over a range k, 10^, implying that almost 
all of the probability is contained in the plateau). This 
behavior is also in agreement with the results obtained 
in Fig. 8, where CV for random binding approaches 1 
with increasing w, indicating that the distribution re¬ 
mains well-spread over a range of t that grows with fj-comp- 
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C. Residence time in an interval 


Using techniques of single molecule microscopy, it is 
possible to directly observe trajectories of individual 
molecules, especially in one and two dimensions. An¬ 
alyzing trajectories that exhibit binding and unbinding 
events, one can extract useful information about reaction 
kinetics at the molecular level [15, 16]. Nevertheless, this 
almost always requires fitting a model to the data. 

If a labeling method can be developed to directly ob¬ 
serve the completion time of a molecular complex, model 
predictions presented so far can be used to fit experi¬ 
mental data and extract kinetic parameters. Neverthe¬ 
less, this would be a challenging task. More often than 
not, there is uncertainty in the number of molecules in 
the complex as well as in determining which molecule(s) 





t 


FIG. 9. Distribution of the completion time for different val¬ 
ues of the total number of molecules and the ratio b*/u*. 
Parameter values are: 7 = 0.1, &i = 2, &* = 0.25, and ui = 1, 
measured in units of /, the hopping rate along the filament. 


would best characterize the completion of the complex. 

In this section, we present model predictions for the 
amount of time the seed spends in an interval of length 
2r centered around the target site, given it was initially 
bound and did not dissociate until the time of measure¬ 
ment. We refer to this time as the residence time, which 
is also the first-passage time of the seed at a distance 
r from the target site, given dissociation is prevented 
by setting 7 = 0. The choice 7 = 0 is justified, as a 
particle cannot be tracked anymore after it dissociates. 
Therefore, all measurements of T^es thus defined corre¬ 
sponds to a model where 7 = 0. Although the infor¬ 
mation contained in the residence time is more indirect 
compared with the information that would be contained 
in the completion time, the residence time is probably 
easier to measure in practice. 

To calculate the distribution of the residence time, de¬ 
noted by fresit)^ we consider the limit k —?► 00 in the 
model illustrated in Fig. 2, which amounts to placing 
perfectly absorbing boundaries at m = —r and r. Sup¬ 
posing that the seed molecule does not dissociate, we can 
calculate the first-passage time at site —r or r, from the 
knowledge of Pm (t) and Qi (t). Details of the calculation 
are presented in Appendix C. Formulas for the mean and 
variance of /res(t) for the first few w are also given in 
Appendix C. 

Fig. 10 shows fres(t) as a function of t for sequential 
(a) and random (b) binding models, for the first few val¬ 
ues of w. We observe that the profile of fres{t) contains 
useful information about the binding model at smaller 
values of r. As r gets larger, distributions with different 
parameter values start to look similar. In the presence 
of measurement errors, this would make it difficult to in¬ 
fer the molecular details about the assembly process via 
measurement of Tres- 

One possible way of directly observing the residence 
time could be achieved by employing nano-materials such 
as DNA origami frames. Two-dimensional frames made 
up of DNA that contain a stretched filament can be ob¬ 
served with atomic force microscopy as well as light mi¬ 
croscopy [17]. This allows making measurements in a 
virtually 2-D space such that the seed molecule would 
not go out of focus while it is bound to the filament. 


D. When multiple seed molecules are present 

We expect the medium to contain multiple seed 
molecules undergoing facilitated diffusion such that the 
overall rate of completion depends on how frequently a 
new seed molecule binds to the target site. By a new 
molecule, we mean any molecule except the one that 
has not dissociated from the filament after becoming un¬ 
bound at the target site (following an incomplete assem¬ 
bly). When the concentration of the seed molecules is 
low, competition among different molecules for the tar¬ 
get site is approximately negligible. In this case, arrival 
of new molecules at the first bound state can be approx- 
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FIG. 10. Distribution of the residence time for sequential and 
random binding models. Each graph shows groups of three 
curves, characterized by the same w value, corresponding to 
three different values of the interval length 2r, measured in 
units of lattice spacing, (a) and (b) show /res(t) as a function 
of t for sequential and random binding, respectively. Param¬ 
eter values are: 7 = 0.1, fei = 2, 6, = 0.25, and ui = 1, 
measured in units of /, the hopping rate along the filament. 


imated by a Poisson process, where the time until arrival 
is distributed exponentially. In other words, the system 
attempts to complete the assembly process at constant 
rate. We would like to remark that this approximation 
was considered in similar contexts in refs. [12] and [9], 
where the low concentration assumption is also discussed 
in the light of biologically relevant values of molecular 
concentrations. 

Let Tdis denote the time at which a seed that started 
in the first bound state dissociates, regardless of whether 
the process completes or not. Distribution of Tdis, de¬ 
noted by /dis is given in Appendix B 1. Next, we define 
/arr(t) and /dis(^) to be distributions of the first arrival 
time, Tarr, of a new seed at the first bound state, and the 
first dissociation time, T^^g, of a seed that was initially 
bound and assuming that the process is not allowed to 
complete, that is, b^, —>■ 0. Note that the time is in¬ 
troduced for convenience, and its role in calculating the 


completion time will become clear in what follows. 

Suppose that initially there are no molecules in the 
vicinity of the target site. After a time Tarr, we ex¬ 
pect a molecule to become bound, which would lead to 
completion with probability Tcomp after a time Tcomp, 
or to dissociation without completion with probability 
1 — Tcomp, after time T^jg. If we call this an attempt, 
then we can formulate the distribution of the first com¬ 
pletion time in terms of the number of attempts that lead 
to the completion of the process. Note that this requires 
(^dis) (2arr), implying that binding of a new molecule 
while another has not yet dissociated from the filament 
is unlikely. 

Let 5comp(t) be the distribution of the completion 
time when the assumption above holds. We can express 

Ilcomp(t) as 


Ilcomp(t) — Tcomp [^arr * ./comp] 

T (1 Tcomp)Tcomp [/arr * /dis * /arr * fct 

T (1 Tcomp)(1 Tcomp)Tcomp 

X [/arr * /dis * /arr * /dis * /arr * /comp] 


( 6 ) 


where * denotes convolution, that is, f * g = /q dsf{t — 
s)g{s). In (6), the first, second and third terms corre¬ 
spond to the probability that the process is completed 
after the first, second and third attempt, multiplied by 
the distribution of the time each route takes. Taking 
the Laplace transform of (6), thereby converting convo¬ 
lutions into products, we obtain 


ilcomp(c) — 


T 


comp/comp (e) 


(1 Tcomp) /dis(^) 


OCl 


-Tc, 


o)/arr(e)/k(e) 


Tc, 


p/arr (e)/comp (c) 


1 (1 Tcomp)/arr(e)/dis(c) 


(7) 


where we assumed the convergence of the geometric sum, 
which certainly holds as e —>■ 0, since 0 < Tcomp < 1, and 

/arr(e) and /dis(e) are Laplace transforms of normalized 
probability distributions. 

Since we assume that the arrival of a new seed is a 
Poisson process, we have /arr(t) = where a is a 

function of the 3-D diffusion coefficient as well as the 
unspecific binding/unbinding rates between the seed and 
the filaments. The distribution /dis(t) can be calculated 
as described in Appendix B 1. 

We can then calculate the mean completion time, start¬ 
ing from the state where no seed is bound, as 


(Tcomp) — ^^^0 


d^comp 

de 


which follows from (5). Substituting (7) into the equation 
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above and performing the limit, we get 


(^comp) — /Tarr “1" /Tcomp 


l-Pr. 


5 \ 


(/^arr “1" ^dis ') 


where /Xarr = (Tlirr) = « ^ and fj^dis' = {Tdis'), where 
the angular brackets imply expected values, as in (5). If 
the complex disappears from the system, then we can 
estimate the rate of formation as 


1 


^comp 


(Tcomp) 


We note that the theory developed by Berg et al. [6], or 
possibly an extension of the widely applicable mean first- 
passage time calculations given by Benichou et al. [18], 
can be used to approximate the association rate a as a 
function of parameters characterizing the whole system, 
including the 3-D diffusion coefficient and the ratio of the 
average inter-filament distance to filament radius. 


IV. DISCUSSION AND CONCLUSIONS 

In this work, we presented a model and its analytic 
solution for the formation of a complex of molecules on a 
filament, applicable to the assembly of the transcription 
complex. The process is driven by seed molecules that 
undergo facilitated diffusion, which consists of 3-D diffu¬ 
sion interrupted by episodes where a molecule associates 
with a filament and undergoes I-D diffusion in search of 
a target site. In this work, we did not explicitly model 
the 3-D diffusion of a seed molecule, which was studied 
earlier [6]. Once the seed molecule becomes bound to the 
target site, a number of auxiliary molecules can reversibly 
bind, forcing the seed to stay bound, until an assembly 
of a given size forms. We believe that including spatial 
degrees of freedom and accounting for the effect of facili¬ 
tated diffusion is the major contribution of this study to 
the existing body of work on the kinetics of aggregation. 

Two quantities were used to characterize the process: 
1) the probability that the assembly completely forms 
and 2) the time it takes for the process to complete, con¬ 
ditioned on completion; once a seed molecule becomes 
bound. Mathematical expressions for these quantities 
are given in (1) and (3), respectively. In Appendix E, 
we also provide a comparison of analytical expressions 
with simulations, verifying the validity of the results. 

Similar to previous studies (see, for instance, [14] and 
[19], where the latter has an experimental component), 
we found that the order in which auxiliary molecules bind 
would matter, and compared two different models where 
auxiliary molecules bind in a strictly sequential order and 
in completely random order. 

The findings indicate that the probability of comple¬ 
tion is greater for random binding than it is for sequen¬ 
tial binding (see Figs. 4 and 5). Interestingly, in random 
binding, when the unbinding rates of auxiliary molecules 
are relatively larger than their binding rates, there can 


be an optimal size for the complex for which the chance 
of completion is maximal (see Fig. 4). While the prob¬ 
ability of completion is larger for the random binding 
model, it can take a long time to reach completion, espe¬ 
cially when the complex contains much more than a few 
molecules. Calculating the completion time distribution 
for the two models, we found that the mean completion 
time grows exponentially with the size of the complex for 
the random binding model, while it grows as a power- 
law or slower for the sequential binding model (see Fig. 
7). In addition, completion time is much more broadly 
distributed for random binding compared to that in se¬ 
quential binding (see Figs. 8 and 9). 

Therefore, there is a trade-off between the probability 
of completion and the completion time, and an optimal 
strategy may consist of an hybrid model, where the first 
few molecules bind in random order to stabilize the form¬ 
ing complex, and the rest of the molecules bind sequen¬ 
tially to reduce the completion time, ensuring that it does 
not scale exponentially with the number of molecules in 
the complex. 

Facilitated diffusion enhances the probability of com¬ 
pletion by increasing the chances for the seed molecule 
to quickly rebind to the target site even if it becomes 
unbound before the process completes (this so-called re¬ 
binding effect was also discussed by other authors, for 
instance, in the context of enzymatic reactions [20], and 
in gene expression [11]). We quantified the amount of en¬ 
hancement by deriving a formula for it as a function of all 
model parameters [see (2)]. Inside the parameter range 
considered here, facilitated diffusion is found to enhance 
the completion probability more strongly for the sequen¬ 
tial binding model compared with the random binding 
model (see Fig. 6). 

Note that the current model can be generalized to 
also let auxiliary molecules undergo facilitated diffusion. 
One possible way of achieving this is generalizing the 
previously studied island growth model [21, 22], where 
monomers adsorb to a surface and undergo diffusion lim¬ 
ited aggregation to form immobile islands, to allow for 
dissociation and re-adsorption of monomers. 

Our results are relevant for the case where the con¬ 
centration of seed molecules is sufficiently low so that 
the competition for the target site is negligible (also see 
refs. [12] and [9]). Under this assumption, we also pro¬ 
vided an approximate analytic expression for the Laplace 
transform of the completion time distribution for multi¬ 
ple seed molecules, from which we obtained the average 
completion time. 

While the model we consider here is appealing for arti¬ 
ficial systems where, for instance, stretched out DNA fil¬ 
aments are placed in nano-engineered structures [17, 23], 
applicability of the model to the formation of the tran¬ 
scription complex in vivo depends on the validity of two 
key assumptions. First, the DNA is assumed to be a 1- 
D filament in the vicinity of a target site (a regulatory 
sequence of a gene). If the dissociation rate 7 is much 
larger than the hop rate /, this assumption is more likely 
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to hold, since the seed would not be able to explore a large 
section of the filament at a time. We should note that 
DNA can be packed inside cells in a highly organized 
manner; it could be condensed by proteins in bacteria, 
and is organized in chromosomes in eukaryotic cells [1]. 
In eukaryotic cells, before transcription of a gene begins, 
the structure of the DNA around the regulatory region 
of the gene loosens up such that transcription factors 
can directly bind to the base pairs. How well this loos¬ 
ened up section of the chromatin can be approximated by 
a filament should eventually be verified by experiments 
in vivo. Second, we assume that the hopping rate along 
the DNA is uniform. Nevertheless, recent studies showed 
that DNA-binding molecules can act like “road blocks” 
that can hinder the 1-D diffusion of transcription fac¬ 
tors [24]. In the presence of molecules that act as road 
blocks as well as non-uniformity in the rate of diffusion 
along different sections of the DNA due to other rea¬ 
sons, the model considered here may underestimate, for 
instance, the variance and higher moments of the com¬ 
pletion time distribution. To improve on this point, one 
can use a more involved model of diffusion in 1-D, al¬ 
lowing for random transition rates, and permeable barri¬ 
ers [25—28], and results derived for transport in random 
environments [29, 30]. 

In this work, we did not investigate the presence of 
cooperativity in binding rates and assumed that the mo¬ 
tion of molecules can be described by Brownian diffusion 
(continuous time random walk with exponential waiting 
times). Inclusion of cooperative binding/unbinding of 
auxiliary molecules and considering anomalous diffusion 
can affect the stability of the complex being formed [14] , 
and lead to correlated bursts in gene expression [11]. 

A key factor that determines the efficiency of search for 
target sites via facilitated diffusion is the relative affinity 
of the seed to unspecific sites on the DNA compared with 
that to the target site. It is worthwhile to note that this 
point becomes even more significant when one accounts 
for the degradation of DNA-binding molecules [31]. In 
our model, the seed hops between all adjacent sites, in¬ 
cluding the target site, at the rate /, and in all of the 
plots we choose bi = 2/, implying that binding to the 
target site happens at only twice the rate at which the 
seed hops between any two sites. Therefore, results il¬ 
lustrated here correspond to the case where the target 
site is not strongly “sticky”. Note that this is consistent 
with experimental findings showing that the probability 
of binding at the first encounter is not necessarily close 
to 1 [-5]. 

In Section III C we presented the model prediction for 


the time it takes for a seed to escape from an interval 
around the target site, provided that it does not dissoci¬ 
ate from the filament until it is observed to escape. Sin¬ 
gle molecule observations can be performed by tagging 
multiple molecules and can be sophisticated enough to 
provide direct information about molecular interactions 
(for a review on multiple experimental methods, see, for 
instance, ref. [3]). Nevertheless, in the most basic setting, 
assuming that the seed molecule can be tracked, the dis¬ 
sociation time and the residence time for the seed can 
be directly accessed, providing evidence for performing 
model selection. However, we remark that the applicabil¬ 
ity of an analysis using the residence time defined above 
would often be limited to observations in the vicinity of 
the binding site, and may require high position precision. 
This is because the in vivo unspecific binding strength of 
molecules to the DNA filament is not necessarily high 
enough for the molecule to cover a large distance on the 
DNA without being dissociated. To provide a quick esti¬ 
mate, we expec t the diffusive displacement of a molecule 
to go as d = (m^) ~ v^/r, where r is the typical time 

a molecule spends attached, which can be estimated as 
r ~ 1/7. Therefore, the distance a molecule covers typ¬ 
ically goes as d ~ \fTJl- Nevertheless, several single 
molecule observations of RNA polymerase in vitro sug¬ 
gest that the dissociation times can be long to allow sig¬ 
nificant motion of the particle along the DNA (see, for 
instance, refs. [32] and [23]). Finally, as also noted in 
Section HIC, we remark that the discriminative power 
of this analysis diminishes as the length of the interval 
increases (see Fig. 10). 

Recent experimental studies on the kinetics of forma¬ 
tion of the transcription initiation complex aim to test 
competing hypotheses of sequential and random binding. 
In this respect, we believe that model predictions, in con¬ 
junction with cutting edge experimental methods, would 
be useful for revealing the dynamics of such mesoscopic 
systems. 
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Appendix A: Master equation and its solution 

In this section we present the mathematical formulation of the model illustrated in Fig. 2 in terms of a set of master 
equations, and its solution. 

We model the seed molecule as a random walker hopping between nearest neighboring sites in a 1-D lattice (see 
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Fig. 2). When the random walker occupies the site m = 0, it can transition to a bound state at rate bi. We consider 
a total number of w bound states, each of which is accessible in a sequential maimer, e.g. a stack of bound states. 
Among the bound states, the rate of transition from state i to i + 1 is denoted by and the rate of transition 
from i to J — 1 is denoted by Ui. When the random walker occupies the first bound state, it becomes unbound at rate 
Ml and returns to the lattice. Note that bi and Ui characterize the rate of going deeper and shallower in the stack of 
bound states, respectively. 

Let Pm(t) denote the probability of finding the random walker unbound at site m and Qi{t) be the probability that 
it is in the bound state. The master equations that govern these probabilities are given by 


dPm 

dt 


— / (^Pm — l P Pm+1 ‘^P'm) ^Pm P ^m,0 ( “t“ MiQi) K {5^ —j- + Pfr. 


where k is a parameter that adjusts the strength of an absorbing boundary at —r and r, and 


(Al) 


— biPo — (mi + 62) Qi P U2Q2, 
dt 

= 62Q1 — {U2 p ^3) Q2 p 


dQw—1 

dt 

dQw 


dt 


bw — lQw — 2 {,'^w — l “t" Qw — 1 “1“ '^yjQ'u 

^wQw — 1 '^wQ w • 


(A2) 


We start with the solution for Pm{t). Let us denote by (pm-nit), the solution of (Al) with the initial condition 
^rn(O) = 5m,m when the terms inside the square brackets are set to zero, which corresponds to random walk in 
an infinite lattice where the random walker disappears at rate 7. Using Laplace and discrete Fourier transforms to 
convert differential equations to algebraic equations, ipm-n(t) can be obtained as 


^m-n{t) = e-(2^+^)‘/^_„(2/t), (A3) 

where Im{t) denotes the modified Bessel function of the first kind [33]. The Laplace transform of (A3) is given by 
(see ref. [34] pp. 75) 


C{e-^H,{bt)) 


[(e + a + 6)i/2-(e + a-6)1/2] 


(2bY (e + a)^ —6^ 


-,1/2 


(A4) 


where, Re[;/] > —1, Re[e] > max{Re[6— a], —Re[6 + a]} and C denotes the Laplace transform defined in (4) with e 
as the Laplace variable. We use tildes (~) to denote Laplace transformed variables as in the main text. To obtain the 
full solution of (Al), we note that it is a first order linear differential equation, which allows us to express its solution 
as 


Pm — ^ ( Pn{^)^m—n 

n 



‘fim-nit - S) [• • • ] (n, s). 


(A5) 


where [• • • ] (n, s) corresponds to the expression in square brackets on the right hand of (Al), as a function of n and 
s. Note that we need to express Qi in terms of Pq in order to obtain a closed equation for Pm's. To achieve this, we 
formally solve the system of equations given in (A2) for an initial condition where the first bound state is occupied 
with probability Qi(0) and all other bound states are initially unoccupied, that is, (5i(0) = 0 for f > 1. Taking the 
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Laplace transform of the system in (A2), and solving recursively, we find 


Qiw — Qw—i 


e-\-u^ 


Qw — l — Qw — 2 ~ 


^w — 1 


e + Uw—i bw — 


Qw—2 — Qw — 3~ 


1 5 

e + Uuj 
bw-2 


e + Um-2 + foui-1 ~ 


'^w—ib'w—1 


e + Uw-i bw — 


Uy 


e + Uy 


Note that we readily have 

Q2 = Qi^^, (A6) 

e + U2 

which can be substituted in the Laplace transform of the first equation in (A2) to obtain an equation that only 
involves Qi and Pq, whose solution is 


where Kj{w) is defined as 


Qi 


gi(o) + biPo 
e + Ml + 62 + K 2 {w)u 2 ’ 


(A7) 


Ki{w) 



Ui j=i 


i-Ujbj) 

(e + Uj + bj+i) 




e + Mi + 6i+i- 

e + Mi+i + &i+2 — 


Ui+lbi+l 

Ui+2bi+2 


(A8) 


Uyybyj 

e “t“ Uyj—i -j- byj ■ 

e + Uyj 

where the big-K notation is one of the convenient ways of denoting parts of continued fractions, defined as [35] 



a* 


bi + 


O.i+1 


bi+i + 


ai+2 


Note that in the limit e —>■ 0, we have 



lim KAw) = - 

e->0 Ui 


Combining all these, we observe that the solution for Qi can be compactly written as 


i=2 


(A9) 


(AlO) 
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for i >2. For w >2, the total probability of being bound is given by 

W 

J2q, = [1- (1 - ifg (... - ))] Q,, (All) 

i=l 


which is obtained by summing up (AlO) and rearranging terms in the summation (the argument w of Ki(w) is omitted 
for brevity in notation). When there is only a single bound state, that is, w = 1, the probability of being bound in 
simply equal to Qi. 

Taking the Laplace transform of (A5), we can obtain an algebraic equation for Pm in terms of the probabilities 
P-r, Pq and Pr, which reads 


Pm — ^’^ra+rP—r ^^m—rPr: 


where we defined 

w , mQi(O) 

™ ™ e + Ml + 62 + A'2(?i’)'U2 

n 

e + Ml + 62 + Ar 2 (M;)u 2 ) 

Substituting m = —r, 0, and r in (A5), we then obtain the following system of linear equations 


1 + Kipo b'_^ Kip-2r 


'P-r' 



Kipr 1 + ^0 Ktp-r 


Po 

= 


Kip2r b'^ 1 + Kifo 


.Pr . 


Ls; J 


which can be solved for P-r, Pq, and Pr to complete the whole solution. 


(A12) 


(A13) 


Appendix B: Probability of completion, first completion and dissociation time distributions 


In this section, we outline the calculation of the probability of completion before dissociation from the filament, 
the first completion time given completion precedes dissociation, and the dissociation time, all for a random walker 
that is initially occupying the first bound state. The appropriate limit in (Al) is k —>■ 0. The full solution for Pm is 
obtained by solving the system of equations in (A13) and substituting the resulting expressions in (A12). The explicit 
solution is given by 


Pm{Pj — Sm + 


'f>m [miQi( 0 ) ~ ^ 1^0 (e + 62 + K2U2)\ 

Ml + (1 + 61^0) (e + &2 + K 2 {w)u 2 ) 


(1 + biifo) (e 

When there is only a single bound state, (Bl) reduces to 


(Bl) 


Pra{Pj — 'P‘m + 


[uiQi{0) - efciSo] 


Ml 


e{l + biipo) 

Note that the expressions above hold for all initial conditions, described by 
where the random walker is initially occupying the first bound state, that is, 'Em 


I and Qi(0). We consider the case 
= 0 and Qi( 0 ) = 1 . 


1. Dissociation time distribution 

We denote the distribution of the dissociation time by fdis(t)- The molecule is allowed to visit any state any number 
of times; therefore, we do not have the restriction Uw = 0 that we used in the calculation for the completion time. In 
this respect, fdis(t) is not a conditional distribution, unlike /comp(^)- The distribution of the first dissociation time 
can formally be written as 

(B2) 


fdis{t) = -^ ( XI W + X 
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where the quantity inside the parentheses is the survival probability, the probability that the random walker is still 
diffusing in the lattice, or the system is in any of the bound states. In the Laplace domain, this expression becomes 


/dis(e) = 1 - e 



(B3) 


The probability of being at any lattice site can be obtained by substituting (fimif-) and summing over m 

p / _ Ml (q + 7 + 4/ + e) _ 

^ ^ (a + 7 + e) [(a + bi) (e + 62 + U2K2{w)) + uia] 

Combining this with (All), the explicit form of the dissociation time is obtained as 

~ , X _ . e (1x1(4/ + 7 + e + a{e)) + ((7 + e)bi + (7 + e + bi) a(e) + Q;(e)^)) 

(7 + e + Q;(e)) {bi (e + 62) + (e + 62 + mi) a{e) + K 2 {w)u 2 {bi + Q!(e))) ’ 


(B4) 


(B5) 


where Ay, is equal to the quantity in the square brackets on the right hand side of (All). 

In order to obtain the distribution of dissociation time given the process never reaches completion before dissociation, 
denoted by /dis(t) in Section HID), one can eliminate the constant b^ and the variable Qw in the set of equations 
(A2) and drop the last equation, thereby removing the possibility of completion. Then, the solution of this modified 
master equation for the dissociation time, performed in the same way as above, would provide /dis(t). 


2. Probability of completion and the completion time distribution 


When the system transitions into the final bound state w, we call the process complete. We can calculate the 
probability of completion and the first completion time by setting u^, the rate of leaving the final bound state, to zero 
and finding out the probability Q™ • Note that this is equivalent to setting = 0 in the master equation and solving 
for Qw With this choice, system trajectories that arrive at the final bound state cannot leave, and the distribution 
of first completion times can be written as 


/comp(t) — 


1 


comp 

1 dQ 


~di ~ f ’ 


comp 


dt 


Uw—0 


(B6) 

(B7) 


where Pcomp is the probability of completion, acting as a normalization constant. Note that the quantity in square 
brackets in (B6) is the survival probability, defined as the probability of not having visited the bound state up 
to time t. In this respect, (B6) is analogous to (B2), except that /comp is a conditional distribution and needs to be 
normalized with the probability of completion, Pcomp- In Laplace domain, we have 


/comp(e) — 


Uw — 0 ) 


(B8) 


since <5^,(0) = 0. Note that this is equivalent to solving the master equations (Al) and (A2) by setting Uw = 0 first, 
and then calculating Qyj, since the solutions (AlO) and (BI) are valid for =0- 


xPc 


comp is the probability of occupancy of state w in the long time limit. Therefore, we have 


comp 


— Qwid 


00; xxu, = 0). To hnd Pcomp we first calculate Qiu(e; Uw = 0) from (AlO). Then, using the limit theorem for the 
Laplace transform, limt_).oo fit) = lime^-o e/(e), we obtain the explicit result 

1 


p — 

J- rnmn — 


1 + A(u;)/(1 + /3)’ 


(B9) 


where 


/3 = h/VWF+W): 

X{w) = [7rf,(2; xx;)]“^ ui(7rf,(3; w) + U2(7rf,(4; w) H-h Uw- 2 i'^hiw\ w) + u^-i) ■■■)), 

W 

TTb{i;w) = 

k=i 


(BIO) 

(Bll) 

(B12) 
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Note that w > 2 and A(2) = ui. 

The Laplace transform of the first completion time valid for w > 2 is obtained from (B8), and is explicitly given by 


^comp(^) 


(- 1 )^ 


Pr, 




e + 62 + uiQ;(e)/ (a(e) + &i) + U 2 K 2 {w) 


(B13) 


Uw—0 


where a{e) = ^(7 + e)(7 + 4/ + e), and Ki{w) is defined in (A8). 


Appendix C: Residence time distribution 


In this section, we are concerned with the case where the random walker is initially occupying the first bound state 
and can be absorbed if it travels far enough from the binding site. We consider the residence time of the random 
walker in a symmetric interval centered around the target site, given the walker does not dissociate from the lattice 
before exiting this interval. To calculate the residence time (equivalently, the first exit or escape time), we consider 
(Al) in the limit k —>■ 00 and 7—^0, which amounts to placing perfectly absorbing boundaries at m = —r and r, and 
eliminating the possibility of dissociation from the lattice. 

After solving for using the same method presented in the previous sections, the Laplace transform of the 
residence time distribution, which we denote by /res, is then found by using (B3). The probability of finding the 
molecule in the unbound state is obtained by summing P^ over all lattice sites, resulting in 




_^i(4/ + e + S{e)) {-{2fY + (2/ + e + S{e)Yf _ 

(e + 5(e)) (mi (( 2/)2’- + /3(e)) 5(e) - (e + 62 + U 2 K 2 {w)) ((2/)2’- (61 - 5(e)) - /3(e) (61 + 5(e)))) 


where /3(e) = ^\/e(4/ + e) + 2/ + ej and 5(e) = ^e(4/ + e). The probability of being in any bound state is given 
by the sum in (All). Substituting these two sums in (B3) and performing the algebra, we obtain 


e ^mi( 4/ + e + 5(e)) ((2/)’’ — (2/ + e + 5(e))’’) + Au,(e + 5(e)) ((2/)^’’ (—61 + 5(e)) + /3(e) (61 + 5(e)))^ 
(e + 5(e)) (ui {{2fY^ + /3(e)) 5(e) + (e + 62 + U 2 K 2 {w)) ((2/)2’' (-61 + 5(e)) + /3(e) (61 + 5(e)))) 

(Cl) 


where A^, is equal to the quantity in the square brackets on the right hand side of (All). We denote the residency 
time as Tres. The moments of Tres can be calculated from 


(Trls) 


(-l)"lim 

£->0 


5 " f 

^ J res 


(C2) 


The mean and variance of Tres, normalized by / ^ and / ^ respectively (/ being the hopping rate along the lattice, 
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proportional to the 1-D diffusion coefficient along the DNA), for the first few values of w are given by 
w = 1 : 


1 

Mres ~ o 1 ^ 


^2 _ 


2 + rbi 
Ui 

3 (2 + rbi)^ + 2 (r + 2r^) biUi + (l + 2r^) ' 

12u? 


w = 2 : 


Mres — 


+ 


(2 + r&i)(l + ^) 


Ul 


CT^es = (I2M1U2) ^ 3 (2 + rbi) ^bl + (3 (2 + rbi) ^ + 2 (r + 2r^) 6 iMi + (l + 2r^) ul)ul 
+ 262 (3 (2 + rbi)'^U 2 + Ul (12 + rbi (6 + U2 + 2r^M2))) 


w = 3 : 


Mres — 2 


+ 


(2 + t- 6 i) 


2^ b2(b3+U3) \ 

U2U3 j 


Ul 


(Tres = {l 2 u\u\u\) ^ (3 (2 + r6i) ^ + 2 (r + 2 r^) biUi + (l + 2 r^) u\) u\u\ + 3 (2 + rbi) ^^2 (^3 + ^3) ^ 
+ 262^6 (2 + rbi)b^ui (63 + U2) + 63 (3 (2 + rbi) ^U2 + Ui (24 + rbi (12 + U2 + 2r'^U2))) U3 
+ (3 (2 + rbi) ^U2 + Ul (12 + rbi (6 + U2 + 2r^U2))) ^3) , 
where r is considered dimensionless, an integer corresponding to the number of lattice sites from the target site. 


Appendix D: Numerical inverse Laplace transform 

Based on previous experience, we pick the Gaver-Stehfest method [36, 37] for the numerical inversion of Laplace 
transforms, which only requires the evaluation of the transformed function at real values of the Laplace variable and is 
suitable for bounded functions such as first-passage time distributions. We employ the algorithm described by Abate 
and Whitt [38] to approximate the inverse Laplace transform of /(e) as 

fit) 


Wk 

where M is a positive integer and [(fc + 1)/2J means the largest integer less than or equal to (fc + l)/2. The number 
M is chosen based on the available numerical precision, and according to the estimate provided in ref. [38] , the result 
has around 2.2M digits of precision. 




t 




min(/c,M) 

_ ^i: 

i=L(fe+l)/2j 


2j! 


(j!)2(A/-j)!(2j-fc)!(A:-j)!’ 


Appendix E: Comparison with simulations 

Assembly formation process is simulated according to the model description given in Section II, using the Gillespie 
algorithm [39] that proceeds by determining the time of the next transition in the system, as well as which transition 
is going to take place. The system state is then updated, and the process is repeated until the random walker: 1) 
arrives at the bound state w, 2) decays (at rate 7 while at an unbound state), or 3) reaches one of the sites —r or 
r, for the computation of TJ-ompi 2dis and T^es^ respectively. Pcomp is obtained by computing the ratio of the number 
of times the assembly forms to the total number of independent simulation runs. In each simulation run, the system 
















19 


starts at the same state where the random walker is occupying the first bound state with certainty, in accordance 
with the model description in Section II. 



W 

P-5.P [Eq. (1)] 

-Pcomp (sim) 

[Eq. (1)] 

Ecomp (sim) 

0.10 

2 

0.5076 

0.5078 [0.5071, 0.5086] 

0.5076 

0.5084 [0.5076, 0.5092] 

0.40 

3 

0.2839 

0.2840 [0.2832, 0.2847] 

0.4423 

0.4420 [0.4413, 0.4429] 

0.08 

4 

0.4202 

0.4206 [0.4199, 0.4213] 

0.7101 

0.7105 ]0.7099, 0.7111] 

0.19 

5 

0.2708 

0.2699 [0.2694, 0.2705] 

0.6863 

0.6859 [0.6852, 0.6866] 

0.13 

6 

0.3397 

0.3398 [0.3394, 0.3402] 

0.8007 

0.8009 ]0.8005, 0.8014] 

0.31 

7 

0.0858 

0.0858 [0.0855, 0.0861] 

0.5531 

0.5527 [0.5520, 0.5534] 

0.15 

8 

0.2979 

0.2983 [0.2976, 0.2990] 

0.8571 

0.8570 ]0.8566, 0.8575] 


TABLE I. Comparison of numerically exact and simulated values of Pcomp. Values of u* and w are indicated in the first two 
columns, and all other parameter values are fixed at: f = 1, bi — 2, ui = 1, fa* = 0.25, 7 = 0.1. In computing simulation 
results, an ensemble of V = 10® independent simulation runs were obtained, and this process was repeated 20 times to compute 
error due to finite sample size. Pcomp (sim) and Pcomp (sim) correspond to simulation results, given in the format x [y, z], where 
X is the average, and [y, z] is the 95% confidence interval computed via bootstrapping. 


In Table I and Fig. II, we display the comparison between analytical and simulation results for Pcomp and the 
cumulative distributions of Tcomp, Tdis and Pres, for several randomly selected parameter values. The cumulative 
probability distribution of T^,, which is straightforward to calculate from simulation data without any binning, is 
defined as 



dsf^{s) = C ^ 



(El) 


where y is either “seq” or “ran”, for sequential and random binding models, respectively, and the inverse Laplace 
transform, denoted by C~^, is performed as in Appendix D. Parameter values are given in captions. As both sets 
of comparisons show, numerically exact and simulation results that are obtained independently are in excellent 
agreement. 
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FIG. 11. (Color online) Comparison of numerically exact and simulated cumulative probability distributions of Tcomp, Tdis and 
Tres, defined in (El). Curves correspond to the numerical inverse Laplace transform of (El) (with the use of Eqs. (B13), (B5) 
and (Cl)), and black filled circles along with error bars represent simulation results. Legends of the topmost graphs apply to 
all graphs in the same column, and values in parenthesis correspond to (6*/u*, w). Eor the calculation of Fles, we set r = 10. 
Rest of the parameter values are the same as those in the caption of Table. 1. In computing simulation results, an ensemble 
of = 10^ independent simulation runs were obtained, and this process was repeated 20 times to compute error bars. Error 
bars correspond to 95% confidence intervals computed via bootstrapping. 
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